Text Extraction From Documents


Text extraction from documents is the process of recovering machine-readable text from scanned documents or images.

Large Language Model and Formal Concept Analysis: a comparative study for Topic Modeling

Feb 02, 2026
Via arXiv

Inferential Question Answering

Feb 01, 2026
Via arXiv

Toward Autonomous Laboratory Safety Monitoring with Vision Language Models: Learning to See Hazards Through Scene Structure

Jan 31, 2026
Via arXiv

MiNER: A Two-Stage Pipeline for Metadata Extraction from Municipal Meeting Minutes

Jan 30, 2026
Via arXiv

OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models

Jan 29, 2026
Via arXiv

Youtu-Parsing: Perception, Structuring and Recognition via High-Parallelism Decoding

Jan 28, 2026
Via arXiv

CitiLink: Enhancing Municipal Transparency and Citizen Engagement through Searchable Meeting Minutes

Jan 26, 2026
Via arXiv

Typhoon OCR: Open Vision-Language Model For Thai Document Extraction

Jan 21, 2026
Via arXiv

PDFInspect: A Unified Feature Extraction Framework for Malicious Document Detection

Jan 19, 2026
Via arXiv

Taxonomy-Aligned Risk Extraction from 10-K Filings with Autonomous Improvement Using LLMs

Jan 21, 2026
Via arXiv